
    Exploratory Analysis of Benchmark Experiments -- An Interactive Approach

    The analysis of benchmark experiments consists in large part of exploratory methods, especially visualizations. In Eugster et al. [2008] we presented a comprehensive toolbox including the bench plot. This plot visualizes the behavior of the algorithms on the individual drawn learning and test samples according to specific performance measures. In this paper we show how an interactive version of the bench plot can easily uncover details and relations that remain unseen in the static version.

    Bench Plot and Mixed Effects Models: First steps toward a comprehensive benchmark analysis toolbox

    Benchmark experiments produce data in a very specific format: the observations are drawn from the performance distributions of the candidate algorithms on resampled data sets. In this paper we introduce new visualisation techniques and show how formal test procedures can be used to evaluate the results. This is the first step towards a comprehensive toolbox of exploratory and inferential analysis methods for benchmark experiments.

    Spider-Man, the Child and the Trickster -- Archetypal Analysis in R

    Archetypal analysis aims to represent observations in a multivariate data set as convex combinations of extremal points. This approach was introduced by Cutler and Breiman (1994); they defined the concrete problem, laid out the theoretical foundations, and presented an algorithm written in Fortran, which is available on request. In this paper we present the R package archetypes, which is available on the Comprehensive R Archive Network. The package provides an implementation of the archetypal analysis algorithm within R and different exploratory tools to analyze the algorithm during its execution and its final result. The application of the package is demonstrated on two examples.
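
    The core step of archetypal analysis, expressing an observation as a convex combination of given archetypes, can be sketched as a constrained least-squares problem. The following is a minimal Python illustration, not the archetypes package itself: the sum-to-one simplex constraint is enforced with a standard weighted-row NNLS trick, and the triangle data are invented for the example.

```python
import numpy as np
from scipy.optimize import nnls

def convex_coefficients(x, Z, huge=200.0):
    """Find alpha >= 0 with sum(alpha) = 1 minimizing ||x - Z.T @ alpha||.

    Z holds one archetype per row. The simplex constraint is enforced
    approximately by appending a heavily weighted row that pushes the
    coefficients to sum to one (a standard NNLS trick).
    """
    A = np.vstack([Z.T, huge * np.ones(Z.shape[0])])
    b = np.append(x, huge)
    alpha, _ = nnls(A, b)
    return alpha

# Three archetypes at the corners of a triangle in the plane.
Z = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])

# A point inside the triangle is recovered exactly as a convex combination.
x = np.array([0.25, 0.25])
alpha = convex_coefficients(x, Z)
```

    Since the archetypes are affinely independent, the convex (barycentric) coefficients of an interior point are unique, so the NNLS solution reproduces the point with zero residual.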

    Probabilistic Archetypal Analysis

    Archetypal analysis represents a set of observations as convex combinations of pure patterns, or archetypes. The original geometric formulation of finding archetypes by approximating the convex hull of the observations assumes them to be real valued. This, unfortunately, is not compatible with many practical situations. In this paper we revisit archetypal analysis from basic principles and propose a probabilistic framework that accommodates other observation types such as integers, binary values, and probability vectors. We corroborate the proposed methodology with convincing real-world applications: finding archetypal winter tourists based on binary survey data, archetypal disaster-affected countries based on disaster count data, and document archetypes based on term-frequency data. We also present an appropriate visualization tool to summarize archetypal analysis solutions better.

    Weighted and Robust Archetypal Analysis

    Archetypal analysis represents observations in a multivariate data set as convex combinations of a few extremal points lying on the boundary of the convex hull. Data points which deviate from the majority have great influence on the solution; in fact, one outlier can break down the archetype solution. This paper adapts the original algorithm to be a robust M-estimator and presents an iteratively reweighted least squares fitting algorithm. As a required first step, the weighted archetypal problem is formulated and solved. The algorithm is demonstrated using both an artificial and a real-world example.
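
    The reweighting idea can be illustrated in miniature with a robust location estimate. The Python sketch below shows iteratively reweighted least squares with Huber weights; it is a generic IRLS illustration under assumed defaults (tuning constant 1.345, MAD scale), not the paper's archetypal algorithm, but it demonstrates how downweighting large residuals tames a single outlier.

```python
import numpy as np

def huber_weights(residuals, k=1.345):
    """Huber weights: 1 inside [-k, k], proportional downweighting outside."""
    a = np.abs(residuals)
    w = np.ones_like(a)
    mask = a > k
    w[mask] = k / a[mask]
    return w

def irls_location(x, k=1.345, tol=1e-8, max_iter=100):
    """Robust location estimate by iteratively reweighted least squares."""
    mu = np.median(x)                                     # robust start
    scale = np.median(np.abs(x - np.median(x))) / 0.6745  # MAD scale
    for _ in range(max_iter):
        w = huber_weights((x - mu) / scale, k)
        mu_new = np.sum(w * x) / np.sum(w)
        if abs(mu_new - mu) < tol:
            break
        mu = mu_new
    return mu

data = np.array([1.0, 1.1, 0.9, 1.05, 0.95, 50.0])  # one gross outlier
mu = irls_location(data)
```

    The ordinary mean of these data is about 9.2, while the IRLS estimate stays close to 1, because the outlier's weight shrinks toward zero at each iteration.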

    Measuring Concentration in Data with an Exogenous Order

    Concentration measures order the statistical units under observation according to their market share. However, there are situations where an order according to an exogenous variable is more appropriate or even required. The present article introduces a generalized definition of market concentration and defines a corresponding concentration measure. It is shown that this generalized concept of market concentration satisfies the common axioms of (classical) concentration measures. In an application example, the proposed approach is compared with classical concentration measures; the data are the transfer spendings of German Bundesliga soccer teams, and the "obvious" exogenous order of the teams is the league ranking.
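
    The contrast between the classical convention (order units by descending share) and an exogenous order can be sketched with a plain concentration ratio. The Python example below is only an illustration of that distinction: the spendings and the league ranking are invented, and the measure shown is not the article's generalized concentration measure.

```python
import numpy as np

def concentration_ratio(shares, k, order=None):
    """Cumulative share of the first k units.

    With order=None the classical convention is used: units are sorted
    by descending market share. Passing an explicit order (e.g. a league
    ranking, as a list of indices) yields the cumulative share under
    that exogenous order instead.
    """
    s = np.asarray(shares, dtype=float)
    s = s / s.sum()
    if order is None:
        s = np.sort(s)[::-1]          # endogenous order: largest first
    else:
        s = s[np.asarray(order)]      # exogenous order
    return s[:k].sum()

# Hypothetical transfer spendings of five teams (million EUR).
spend = [40.0, 10.0, 25.0, 5.0, 20.0]
ranking = [2, 4, 0, 1, 3]  # hypothetical league ranking (team indices)

cr2_classical = concentration_ratio(spend, 2)           # top-2 spenders
cr2_exogenous = concentration_ratio(spend, 2, ranking)  # top-2 ranked teams
```

    Here the two top spenders hold 65% of total spending, while the two top-ranked teams hold only 45%; the two conventions generally give different values whenever the exogenous order disagrees with the share order.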

    From Spider-Man to Hero - Archetypal Analysis in R

    Archetypal analysis aims to represent observations in a multivariate data set as convex combinations of extremal points. This approach was introduced by Cutler and Breiman (1994); they defined the concrete problem, laid out the theoretical foundations, and presented an algorithm written in Fortran. In this paper we present the R package archetypes, which is available on the Comprehensive R Archive Network. The package provides an implementation of the archetypal analysis algorithm within R and different exploratory tools to analyze the algorithm during its execution and its final result. The application of the package is demonstrated on two examples.

    Having the Second Leg At Home - Advantage in the UEFA Champions League Knockout Phase?

    In soccer knockout ties played in a two-legged format, the team having the return match at home is usually seen as advantaged. To check this common belief, we analyzed matches of the UEFA Champions League knockout phase since 1995. We show that the observed differences in the frequencies of winning between teams first playing away and teams first playing at home can be completely explained by their performances in the group stage and, more importantly, by the teams' general strength.

    (Psycho-)Analysis of Benchmark Experiments

    It is common knowledge that certain characteristics of data sets -- such as linear separability or sample size -- determine the performance of learning algorithms. In this paper we propose a formal framework for investigating this relationship. The framework combines three methods, each well established in its respective scientific discipline. Benchmark experiments are the method of choice in machine and statistical learning to compare algorithms with respect to a certain performance measure on particular data sets. To capture the interaction between data sets and algorithms, the data sets are characterized using statistical and information-theoretic measures, a common approach in the field of meta-learning to decide which algorithms are suited to particular data sets. Finally, the performance ranking of algorithms on groups of data sets with similar characteristics is determined by means of recursively partitioned Bradley-Terry models, which are commonly used in psychology to study the preferences of human subjects. The result is a tree with splits on data set characteristics which significantly change the performances of the algorithms. The main advantage is the automatic detection of these important characteristics. The framework is introduced using a simple artificial example. Its real-world usage is demonstrated by means of an application example consisting of thirteen well-known data sets and six common learning algorithms. All resources to replicate the examples are available online.
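
    The Bradley-Terry component can be sketched independently of the tree framework. Below is a minimal Python implementation of the classical MM (Zermelo) iteration for Bradley-Terry strengths, applied to a made-up matrix of pairwise win counts (e.g. how often one algorithm outperforms another across data sets); the recursive partitioning over data set characteristics is not shown.

```python
import numpy as np

def bradley_terry(wins, tol=1e-10, max_iter=1000):
    """Fit Bradley-Terry strengths by the classical MM iteration.

    wins[i, j] is the number of times item i beat item j. Returns
    strengths normalized to sum to one; under the model,
    P(i beats j) = p[i] / (p[i] + p[j]).
    """
    n = wins.shape[0]
    total = wins + wins.T  # number of comparisons per pair
    w = wins.sum(axis=1)   # total wins per item
    p = np.ones(n) / n
    for _ in range(max_iter):
        denom = np.array([
            sum(total[i, j] / (p[i] + p[j]) for j in range(n) if j != i)
            for i in range(n)
        ])
        p_new = w / denom
        p_new /= p_new.sum()
        if np.max(np.abs(p_new - p)) < tol:
            break
        p = p_new
    return p

# Toy example: three algorithms, pairwise "win" counts over data sets.
wins = np.array([[0, 8, 9],
                 [2, 0, 7],
                 [1, 3, 0]])
p = bradley_terry(wins)
```

    Since every pair has wins in both directions, the comparison graph is strongly connected and the iteration converges to the maximum likelihood strengths, which here rank the first algorithm highest.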

    Exploratory and Inferential Analysis of Benchmark Experiments

    Benchmark experiments produce data in a very specific format: the observations are drawn from the performance distributions of the candidate algorithms on resampled data sets. In this paper we introduce a comprehensive toolbox of exploratory and inferential analysis methods for benchmark experiments based on one or more data sets. We present new visualization techniques, show how formal non-parametric and parametric test procedures can be used to evaluate the results, and, finally, show how to aggregate the results into a statistically correct overall order of the candidate algorithms.
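
    As a small illustration of such a non-parametric procedure, the Friedman test, a standard choice for comparing several algorithms across blocks of data sets, can be applied to benchmark-style results. The data below are simulated for the example and are not from the paper; the third algorithm is constructed to be systematically worse, so the test should detect a difference.

```python
import numpy as np
from scipy.stats import friedmanchisquare

rng = np.random.default_rng(0)

# Simulated benchmark results: misclassification error of three candidate
# algorithms (columns) on ten resampled data sets (rows). Algorithm C is
# constructed to be systematically worse by a fixed offset.
base = rng.uniform(0.10, 0.20, size=10)
errors = np.column_stack([
    base + rng.normal(0, 0.01, 10),         # algorithm A
    base + rng.normal(0, 0.01, 10),         # algorithm B
    base + 0.08 + rng.normal(0, 0.01, 10),  # algorithm C (worse)
])

# Friedman test: do the algorithms differ in performance across data sets?
stat, pvalue = friedmanchisquare(errors[:, 0], errors[:, 1], errors[:, 2])
```

    The test ranks the algorithms within each data set and checks whether the rank sums differ more than expected by chance; with the injected offset the null hypothesis of equal performance is rejected.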